
refactor(parser): merge lexer with parser & optimize parser #679

Merged: 18 commits into aurelia:master from new-parser, May 5, 2018

Conversation

@fkleuver (Member) commented Apr 24, 2018:

This is a rewrite of the parser that replaces all string-based token comparisons with bitwise operators to speed things up, and also combines the lexer into the parser.
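
To make "bitwise tokens" concrete, here is a minimal sketch with a made-up bit layout (not the PR's actual constants) of how an integer token can encode its category and precedence, so comparisons become single mask operations instead of string equality checks:

```js
// Hypothetical bit layout, for illustration only.
const T_PrecedenceShift = 6;
const T_BinaryOp = 1 << 11; // category flag bit

// Middle bits carry precedence, low bits a unique id.
const T_Add      = T_BinaryOp | (6 << T_PrecedenceShift) | 0;
const T_Multiply = T_BinaryOp | (7 << T_PrecedenceShift) | 1;

// One bitwise AND instead of comparing against a list of operator strings.
function isBinaryOp(token) {
  return (token & T_BinaryOp) === T_BinaryOp;
}

// Precedence comes out of the token itself, no table lookup needed.
function precedenceOf(token) {
  return (token >> T_PrecedenceShift) & 0b11111;
}
```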

Parsing binary expressions is flattened into a single recursive function that takes operator precedence into account, drastically reducing unnecessary nested calls.
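
And a rough sketch of the flattening itself: one precedence-climbing function replaces a fixed chain of parseEquality, parseRelational, parseAdditive, etc. calls. It reuses isBinaryOp/precedenceOf from the sketch above; parseUnary, nextToken, and operatorTextOf are assumed stand-ins for the real scanner internals, and Binary is the existing AST node.

```js
// Parse a run of binary operators at or above minPrecedence, folding the
// operands into left-associative Binary nodes as we go.
function parseBinary(state, left, minPrecedence) {
  while (isBinaryOp(state.currentToken)) {
    const opToken = state.currentToken;
    const precedence = precedenceOf(opToken);
    if (precedence < minPrecedence) {
      break; // let an outer call consume lower-precedence operators
    }
    nextToken(state);
    // The right operand first absorbs any tighter-binding operators.
    const right = parseBinary(state, parseUnary(state), precedence + 1);
    left = new Binary(operatorTextOf(opToken), left, right);
  }
  return left;
}
```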

Functionality should be identical afaik. Error checking/reporting might not be; I still need to check various error scenarios to ensure the strictness is the same. I wanted to gather some feedback first before I spend a bunch of time on that though :)

Performance improvement is around 200%, depending on the type of expressions parsed. A simple benchmark is available here.

Overall file size is slightly reduced (70 bytes less unminified; probably a bit more when minified, since the comments I added get stripped).

@jdanyow @EisenbergEffect Is this viable, you think?

@fkleuver force-pushed the new-parser branch 2 times, most recently from 3360a81 to 51f7956 on April 24, 2018 03:39
@EisenbergEffect (Contributor):

Wow! Very cool! @jdanyow and @bigopon, can you take a look at this?

@bigopon (Member) commented Apr 24, 2018:

I like this, especially the binary expression parsing; it's clever (though I'd say the old parser is easier to digest), and the tests look nice. I checked it out and tested it on one of my projects. There are one or two places left to pick for some more gains. It's up to @jdanyow and @EisenbergEffect to decide what to do with Lexer and Token.

@fkleuver (Member, Author) commented Apr 24, 2018:

@bigopon Thanks for taking a look!

> 1 or 2 places to pick for some more gains

Which ones?

I know numeric and unicode parsing can still be optimized quite a bit by processing the character code points and performing direct conversion, rather than using parseInt, parseFloat and regex. I could probably do the numeric parsing. Improving unicode would open Pandora's box though: identifiers should then also be taken along imo, and there's quite a bit of work there.

@Alexander-Taran (Contributor):

awesome "refactoring"

@fkleuver (Member, Author):

Updated number scanning: for normal integers it now calculates the value instead of parsing it, which is about 15x faster. For floats there's no difference. Also removed the offset handling from error(). As far as performance goes, that's pretty much all the low-hanging fruit now afaik.
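
For reference, a hedged sketch of what "calculate instead of parse" means for integers; standalone names, not the parser's actual state handling:

```js
// '0' is char code 48 and '9' is 57; accumulate the value digit by digit
// instead of slicing the input and calling parseInt on the substring.
function scanInteger(input, index) {
  let value = 0;
  let code = input.charCodeAt(index);
  while (code >= 48 && code <= 57) {
    value = value * 10 + (code - 48);
    code = input.charCodeAt(++index); // NaN at end of input exits the loop
  }
  return { value, index };
}

scanInteger('42 + foo', 0); // { value: 42, index: 2 }
```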

@fkleuver (Member, Author) commented Apr 26, 2018:

My takeaway from the feedback so far is that it's a nice idea but the readability could be improved, so with the latest two commits I cleaned things up a bit:

  • Wrapped input.charCodeAt in a currentChar getter (was peek in the Lexer)
  • Wrapped index++ in a nextChar() function (was advance() in the Lexer)

And a few more misc things where I reuse functions instead of inlining them; the sketch after this paragraph shows the shape of the two wrappers.
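
A rough illustration; the ParserState container here is hypothetical (in the PR these live on the parser itself):

```js
class ParserState {
  constructor(input) {
    this.input = input;
    this.index = 0;
  }
  // replaces the Lexer's peek: read the current char code without consuming
  get currentChar() {
    return this.input.charCodeAt(this.index);
  }
  // replaces the Lexer's advance: move forward and read the next char code
  nextChar() {
    return this.input.charCodeAt(++this.index);
  }
}
```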

The inlining didn't give nearly as big a performance boost as the switch from string tokens to bitwise tokens. Runtime of the mini benchmark went from ~116ms to ~124ms (compared to ~360ms without any of the changes), so I'd argue a few percent of performance loss is probably worth the gain in readability/maintainability.

Regarding splitting the parser/lexer: I kind of had to throw everything on one big pile first so I could see the whole picture. The main point was to parse while scanning (instead of scanning everything and then parsing), as this saves some memory and array operations.
The classes don't really need to be separate. If you guys prefer it, I could probably split them up again without any significant performance loss. That's up to you guys now.

@fkleuver (Member, Author):

Side note: the runtime could probably also be sped up a little by using the numeric tokens instead of strings in Binary.evaluate (potentially with a function lookup table instead of a switch). The downside is that during debugging you see numbers instead of strings like '===' when you inspect the AST. Opinions, @bigopon @jdanyow @EisenbergEffect?
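
To illustrate the idea, a sketch of the lookup-table variant using the hypothetical numeric tokens from the earlier sketch; the shipped code still switches on operator strings:

```js
// Numeric tokens as keys into a table of tiny evaluator functions.
const evaluators = {
  [T_Add]: (left, right) => left + right,
  [T_Multiply]: (left, right) => left * right
};

// Instead of: switch (this.operation) { case '+': ...; case '===': ...; }
function evaluateBinary(opToken, left, right) {
  return evaluators[opToken](left, right);
}

evaluateBinary(T_Add, 2, 3); // 5
```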

@EisenbergEffect (Contributor):

I'm ok with having it as one class or splitting it. No real hard opinion from me. If it's easier to maintain together, then I think it's fine to keep it as one. I'd like to get another pass from @bigopon on this and maybe some feedback from @jdanyow as well, to make sure there aren't any missed edge cases that for some reason weren't covered by existing tests.

I'm pretty excited about this :)

@jdanyow (Contributor) commented Apr 28, 2018:

This looks awesome, nice work 💪 🚀

My only feedback is around the benchmark: I'd like to see the test cases skewed more towards typical binding use cases, just to confirm we don't have a regression there (I doubt we will). Bindings using AccessScope and AccessMember expressions outnumber other types of expressions by a large factor in typical apps. Could you update the linked benchmark to use at least 4 times as many simple expressions, like fooA, fooB, fooC, ..., fooZ and foo.barA, foo.barB, etc.?

@fkleuver (Member, Author) commented Apr 28, 2018:

@jdanyow Good point. I split up the individual parsing measurements so you can see where the differences are, and added a weight factor just in case, so the Access expressions make up the majority of all parses.

I noticed that my version seems to benefit a lot specifically from the way Chrome optimizes hot paths.

For the first run (page reload -> press test) the results fluctuate between 90% and 140%. There's a regression in parsing floats, but that's to be expected; it's a tradeoff for speeding up integer parsing (which I think is far more common).

For subsequent runs (press test -> press test -> press test) the results stabilize around 180%. That doesn't really help with initial page loads, but it would be interesting to see whether things can be optimized beforehand with SSR / AOT by hinting the browser about where the hot paths are.

Relative results in Edge and FF are somewhat similar for first runs, though Edge is about 3x slower overall and FF about 2x slower. I guess it's mostly array access and switch statement related; Chrome really optimized those.

@jdanyow (Contributor) left a review:

💯

```diff
    args.push(this.parseExpression());
  }

- result = new ValueConverter(result, name, args, [result].concat(args));
+ result = new ValueConverter(result, name, args);
```
@jdanyow (Contributor): nice.

src/parser.js (outdated)

```diff
  constructor() {
-   this.cache = {};
-   this.lexer = new Lexer();
+   this.cache = Object.create(null);
```
@jdanyow (Contributor): 👍

@jdanyow (Contributor) commented Apr 28, 2018:

@fkleuver what are your thoughts on #641? Is there a better way of handling that in the new Parser architecture? Could we handle it out of the box while keeping backwards compat and not adding too much bloat?

@fkleuver (Member, Author) commented Apr 29, 2018:

@jdanyow Unicode shouldn't be an issue as long as we stick to the BMP. Without having to deal with surrogate pairs, it can be as simple as a lookup table (with all the valid identifier code points) consulted in the default switch case in scanToken(). The lookup table can be generated from unicode-10.0.0. I don't see any performance drawbacks there, but it would increase the size of the build.
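
Shape-wise, such a table could be as simple as the following sketch; the single range populated here is an illustrative sample, not the real generated unicode-10.0.0 data:

```js
// One byte per BMP code point, generated at build time from the Unicode
// ID_Continue ranges.
const identifierPart = new Uint8Array(0x10000);
for (let cc = 0x00C0; cc <= 0x00D6; cc++) identifierPart[cc] = 1; // À..Ö

// O(1) check from scanToken's default case; BMP only, no surrogate pairs.
function isIdentifierPart(charCode) {
  return identifierPart[charCode] === 1;
}
```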

@EisenbergEffect (Contributor) commented Apr 29, 2018 via email

@fkleuver (Member, Author):

@EisenbergEffect Approximately 10kb for identifier start and identifier part (compressed).

@EisenbergEffect (Contributor):

I would prefer not to add something like that to the core. Do you think there is some way that it could hook in as a plugin for people who really want it?

@fkleuver (Member, Author) commented Apr 29, 2018:

Sure. We could add something like a ParserOptions class with a ScanUnicodeIdentifier flag/method that gets called if scanToken falls through to the default case.

It doesn't necessarily have to be a standalone plugin btw. As long as the lookup is in a separate file, the build script could output an additional aurelia-binding-unicode file that includes it; users could then specifically include that version instead of the default if they want. I've seen @bigopon do some really neat stuff composing a bundle of only the stuff you need. If that works for templating, it should work for binding too :)

@EisenbergEffect (Contributor):

The trick is that it needs to work with all build types, including require.js with our current CLI.

@fkleuver (Member, Author) commented Apr 29, 2018:

Hm, I took a look at it, and having an alternative dist/../aurelia-binding-unicode.js probably causes more problems than it solves due to the existing hard references in the tooling etc.

I added an initial idea to do this via a simple overridable extension point. Performance impact for the default implementation is negligible as it only does one extra call on the last character of an identifier.

Users could provide their own implementation and register it as the IdentifierValidator themselves, or a plugin could do this. That's probably the best of both worlds; a sketch of the shape is below.
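
The extension point might look something like this; the IdentifierValidator name comes from the description above, while the method name and the registration call are my assumptions:

```js
// Hypothetical default: reject non-ASCII, so scanToken behaves as today.
export class IdentifierValidator {
  // called when scanToken falls through to its default case
  isValidIdentifierPart(charCode) {
    return false;
  }
}

// A unicode-aware implementation could be swapped in through DI, e.g.:
// container.registerSingleton(IdentifierValidator, UnicodeIdentifierValidator);
```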

I'm not 100% sure this is the best name and/or form of encapsulation for the two methods. I was thinking it could be better to have a more generic ...Options/Extensions class injected into the parser, so the signature doesn't need to change if extra options are added in the future.

I'm thinking here of things like auto object creation (#205), debug logging (#30), and globally adding a value converter (I don't recall the issue).

What do you guys think? Have a single ...Options kind of class injected, or just extend with additional constructor arguments if extra stuff is needed in the future?

Or, to keep the iteration size manageable, leave unicode out of scope for this PR and do that separately.

@jdanyow (Contributor) commented Apr 29, 2018:

For sure, out of scope for this PR; I just wanted your thoughts on it. Thanks!

@fkleuver (Member, Author) commented Apr 29, 2018:

Alright, I removed that last commit.

To tie up the remaining loose ends: there are still three // todo comments from the old parser that I wasn't sure what to do with.

  1. // todo(kasperl): Is this really supposed to be expressions?
    I can't think of any expression that wouldn't work as a parameter to behaviors/converters. Maybe cover the various possibilities with tests and then remove the comment?

  2. // todo(kasperl): Check that this is an identifier. Are keywords okay?
    According to the spec they are not, but in practice every browser seems to be fine with identifiers like undefined, null, true, etc. Excluding those would be a breaking change, so I'd say check that it's an identifier, literal, indexed property, or keyword. That should cover all working scenarios, I think?

  3. // todo(kasperl): Stricter checking. Only allow identifiers and strings as keys. Maybe also keywords?
    Same as 2.

(edit: just realized literals and indexed properties are already covered by AccessKeyed)

@fkleuver (Member, Author):

Alright, with the latest commit the todos are gone. Mind double-checking that I addressed them in a desirable manner? Other than that, I think I'm done :)

@jdanyow (Contributor) commented Apr 29, 2018:

Last two commits look good. I think the two kasperl todos were holdovers from the original Angular Dart source for the lexer/parser/AST.

@fkleuver (Member, Author):

Cool, cheers. Let me know if you need anything else for ze merge.

@bigopon (Member) commented Apr 30, 2018:

For extension, we can apply the same strategy as with the Unparser, but it will be opt-in instead of opt-out.

@3cp (Member) commented May 1, 2018:

Damn hardcore! Just want to provide feedback that it all worked in my app.

@fkleuver (Member, Author) commented May 1, 2018:

Nice, thanks!

@stsje commented May 1, 2018:

Will the performance improvement be noticeable in business applications?

@fkleuver (Member, Author) commented May 1, 2018:

@stsje Probably not. The framework overhead is already hardly noticeable; most time is spent rendering, waiting for the network, or executing app logic. Parsing makes up less than 1% of total execution time during the initial load (as do most other processes within Aurelia). The biggest performance improvements are in the hands of app developers.

It can make a difference when you're dynamically generating a lot of HTML and compiling it on the fly, and/or using the computed observation adapter on many getters with large function bodies. That's rare though, and definitely not the goal here.

Firstly, I see this as a proof of concept for applying techniques that may be able to speed up other parts of the framework, slightly decrease initial load times, perform better in benchmarks and so on. Overall it's in the spirit of Aurelia to push performance as much as possible; this is just one small step.

Secondly, I hope this opens the door to implementing a larger portion of the JavaScript syntax for binding, to do more interesting things: this performance gain creates some room for other features that would otherwise not be possible without a performance loss.

@jdanyow (Contributor) commented May 3, 2018:

@fkleuver I was getting ready to merge this and noticed a small issue with one of the patterns used in the parser tests that have try-catch logic. I think these changes need to be made to each try-catch style test:

  describe('does not parse LiteralObject with computed property', () => {
    const expressions = [
      '{ []: "foo" }',
      '{ [42]: "foo" }',
      '{ ["foo"]: "bar" }',
      '{ [foo]: "bar" }'
    ];

    for (const expr of expressions) {
      it(expr, () => {
+       let error = null;
        try {
          parser.parse(expr);
-         pass = true;
        } catch (e) {
-         expect(e.message).toContain('Unexpected token [');
+         error = e;
        }
+       expect(error).not.toBeNull();
+       expect(error.message).toContain('Unexpected token [');
      });
    }
  });

Otherwise, if the parse doesn't throw as expected, the test will still pass.

@fkleuver (Member, Author) commented May 3, 2018:

@jdanyow Glad you caught that; apparently there were a couple of duplicate tests in there as well, and two little issues with the invalid assignment tests. All should be sorted now.

@jdanyow merged commit fd6083f into aurelia:master on May 5, 2018
@jdanyow (Contributor) commented May 5, 2018:

nice work @fkleuver!

@Alexander-Taran (Contributor):

awesome !!!!11
